Goto

Collaborating Authors

FLOP reduction



Channel Gating Neural Networks

Weizhe Hua, Yuan Zhou, Christopher M. De Sa, Zhiru Zhang, G. Edward Suh

Neural Information Processing Systems

Unlike static network pruning, channel gating optimizes CNN inference at run-time by exploiting input-specific characteristics, which allows substantially reducing the compute cost with almost no accuracy loss. We experimentally show that applying channel gating in state-of-the-art networks achieves 2.7-8.0
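The gating idea in this abstract can be illustrated with a toy sketch: compute a cheap partial sum over a subset of input channels, then evaluate the remaining channels only at outputs where the partial sum crosses a threshold. The dense (non-convolutional) setting, the names (`gated_output`, `delta`), and the fixed channel split are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gated_output(x, w_base, w_rest, delta):
    """Toy channel-gating unit (hypothetical shapes).

    x: (C,) input vector; w_base: (K, Cb) weights over the first Cb
    input channels; w_rest: (K, C - Cb) weights over the remaining
    channels; delta: gating threshold.
    """
    cb = w_base.shape[1]
    partial = w_base @ x[:cb]            # cheap partial sum on the base path
    gate = partial > delta               # per-output decision to keep computing
    full = partial.copy()
    # Evaluate the remaining input channels only where the gate fires,
    # skipping that compute (the FLOP savings) everywhere else.
    full[gate] += w_rest[gate] @ x[cb:]
    return full
```

With `delta` set very low the gate always fires and the unit reduces to the ordinary dense product, which is a convenient sanity check; raising `delta` trades accuracy for skipped computation.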


Appendix: Representation Learning Process

Neural Information Processing Systems

Here we provide more experimental results. Specifically, we evaluate the representational similarity, using the CKA value, of the same layer from the model (ResNet-32) trained with different sparsity at each epoch, and compare them with the final model. In our work, we evaluate four different types of freezing schemes (Sec. ). In this case, the single-shot & resume scheme has the same FLOPs reduction as the single-shot scheme, and the entire network can be fine-tuned at the end of training with a small learning rate. For the periodic freezing scheme, we freeze the selected layers periodically with a given frequency so that all layers/blocks can be updated at different stages of the training process.
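The CKA values mentioned above compare a layer's representations across training. Below is a minimal sketch of linear CKA between two activation matrices; this is the standard linear formulation, not this appendix's exact evaluation code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_examples, d1) and Y: (n_examples, d2) are activations of the
    same examples at two layers (or the same layer at two checkpoints).
    Returns a similarity in [0, 1]; 1 means identical up to rotation/scale.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style numerator and normalizing denominator.
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den
```

Comparing a checkpoint's layer against the same layer of the final model, as described above, amounts to calling `linear_cka(acts_epoch_t, acts_final)` on activations collected over a fixed evaluation batch.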




TinyDrop: Tiny Model Guided Token Dropping for Vision Transformers

Wang, Guoxin, Wang, Qingyuan, Huang, Binhua, Chen, Shaowu, John, Deepu

arXiv.org Artificial Intelligence

ABSTRACT Vision Transformers (ViTs) achieve strong performance in image classification but incur high computational costs from processing all image tokens. To reduce inference costs in large ViTs without compromising accuracy, we propose TinyDrop, a training-free token dropping framework guided by a lightweight vision model. The guidance model estimates the importance of tokens while performing inference, selectively discarding low-importance tokens before the large ViT model performs its attention calculations. The framework operates plug-and-play, requires no architectural modifications, and is compatible with diverse ViT architectures. Evaluations on standard image classification benchmarks demonstrate that our framework reduces FLOPs by up to 80% for ViTs with minimal accuracy degradation, highlighting its generalization capability and practical utility for efficient ViT-based classification.
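The guided token dropping described above can be approximated by scoring tokens with a small model and keeping only the top fraction before the large ViT's attention layers. The function below is a hypothetical sketch: `importance` stands in for the guidance model's per-token scores, and the fixed `keep_ratio` is an assumption, not the paper's selection rule.

```python
import numpy as np

def drop_tokens(tokens, importance, keep_ratio=0.2):
    """Keep the most important fraction of tokens (toy sketch).

    tokens: (N, D) token embeddings; importance: (N,) scores from a
    small guidance model; keep_ratio: fraction of tokens to retain.
    Returns the kept tokens and their original indices.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    idx = np.argsort(importance)[::-1][:n_keep]  # highest-scoring tokens
    idx.sort()                                   # preserve spatial order
    return tokens[idx], idx
```

Because only the surviving tokens are forwarded, attention cost, which grows quadratically in sequence length, drops sharply; the [CLS] token would in practice always be kept, which this sketch omits.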




Layer Pruning with Consensus: A Triple-Win Solution

Mugnaini, Leandro Giusti, Duarte, Carolina Tavares, Costa, Anna H. Reali, Jordao, Artur

arXiv.org Artificial Intelligence

Layer pruning offers a promising alternative to standard structured pruning, effectively reducing computational costs, latency, and memory footprint. While notable layer-pruning approaches aim to detect unimportant layers for removal, they often rely on a single criterion that may not fully capture the complex, underlying properties of layers. We propose a novel approach that combines multiple similarity metrics into a single expressive measure of low-importance layers, called the Consensus criterion. Our technique delivers a triple-win solution: low accuracy drop, high performance improvement, and increased robustness to adversarial attacks. With up to 78.80% FLOPs reduction and performance on par with state-of-the-art methods across different benchmarks, our approach reduces energy consumption and carbon emissions by up to 66.99% and 68.75%, respectively. Additionally, it avoids shortcut learning and improves robustness by up to 4 percentage points under various adversarial attacks. Overall, the Consensus criterion demonstrates its effectiveness in creating robust, efficient, and environmentally friendly pruned models.
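One way to read the Consensus criterion is as a normalized combination of several per-layer similarity scores. The sketch below averages min-max-normalized metrics and prunes the lowest-scoring layers; the names and the simple averaging rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def consensus_scores(metric_scores):
    """Combine several per-layer importance metrics into one score.

    metric_scores: dict mapping metric name -> (L,) array of per-layer
    importance under that metric. Each metric is min-max normalized to
    [0, 1] so no single criterion dominates, then averaged.
    """
    stacked = []
    for s in metric_scores.values():
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        stacked.append((s - s.min()) / rng if rng > 0 else np.zeros_like(s))
    return np.mean(stacked, axis=0)

def layers_to_prune(scores, n_prune):
    """Indices of the n_prune layers with the lowest consensus score."""
    return np.argsort(scores)[:n_prune]
```

In this reading, a layer ranked unimportant by every metric gets a low consensus score and is removed first, while disagreement between metrics keeps a layer safe, which is the intuition behind combining criteria rather than trusting one.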